We retrieved and curated 82 high-quality datasets from Gemma, focusing on case-control observational studies. These datasets, sourced from GEO, underwent rigorous curation to ensure reliable gene expression data. Our inclusion criteria were a minimum of three samples per condition, no drug treatment, batch effects either corrected or not detected, and at least 15 differentially expressed genes (FDR < 0.2). DNA-microarray data were quantile-normalized and log-transformed, while RNA-Seq data were obtained as log2-transformed counts per million reads. We mapped probes to gene IDs and, for genes with multiple probes, averaged the probe values. Quality checks and further analyses are detailed in the Supplementary Materials.
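The microarray preprocessing steps named above can be illustrated with a minimal sketch. It is not the pipeline used in this study. The function names, the pseudocount, and the tie handling in the quantile normalization are assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes-by-samples matrix.

    Simplification (assumption): ties within a sample are broken by row order.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [mean(sc[k] for sc in sorted_cols) for k in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


def log2_transform(matrix, pseudocount=1.0):
    """Log2-transform intensities; the pseudocount is a hypothetical choice."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def average_probes_per_gene(probe_values, probe_to_gene):
    """Average the per-sample values of all probes mapping to the same gene."""
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without a gene mapping are dropped
            grouped[gene].append(values)
    return {g: [mean(col) for col in zip(*rows)] for g, rows in grouped.items()}
```

After quantile normalization every sample shares the same empirical distribution, which is why it is commonly applied before comparing conditions across arrays.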
| GEO | Disease/Target pathway | Nr. of samples | Genome coverage | Nr. of DEGs | Batch effect |
|---|---|---|---|---|---|
| GSE10586 | Type I diabetes mellitus | 27 (12/15) | 18023 | 25 (18/7) | Corrected |
| GSE41762 | Type I diabetes mellitus | 77 (20/57) | 18094 | 500 (303/197) | Corrected |
| GSE10715 | Colorectal cancer | 30 (19/11) | 18210 | 500 (244/256) | Corrected |
| GSE13067 | Colorectal cancer | 72 (11/61) | 18722 | 500 (279/221) | Corrected |
| GSE31737 | Colorectal cancer | 79 (40/39) | 15188 | 500 (292/208) | Corrected |
| GSE4107 | Colorectal cancer | 22 (12/10) | 18618 | 500 (297/203) | Corrected |
| GSE49355 | Colorectal cancer | 56 (39/17) | 11233 | 500 (192/308) | Corrected |
| GSE50117 | Colorectal cancer | 18 (9/9) | 10100 | 500 (311/189) | Not detected |
| GSE10810 | Breast cancer | 58 (31/27) | 18543 | 500 (77/423) | Corrected |
| GSE26304 | Breast cancer | 115 (109/6) | 16638 | 500 (107/393) | Corrected |
| GSE10927 | Renal cell carcinoma | 64 (54/10) | 18392 | 500 (94/406) | Corrected |
| GSE15641 | Renal cell carcinoma | 92 (69/23) | 11227 | 500 (103/397) | Corrected |
| GSE33371 | Renal cell carcinoma | 64 (54/10) | 18392 | 500 (94/406) | Corrected |
| GSE53757 | Renal cell carcinoma | 143 (71/72) | 18268 | 500 (173/327) | Corrected |
| GSE11682 | Prostate cancer | 33 (17/16) | 19547 | 59 (28/31) | Corrected |
| GSE22260 | Prostate cancer | 30 (20/10) | 15837 | 500 (329/171) | Not detected |
| GSE30521 | Prostate cancer | 22 (17/5) | 15150 | 500 (292/208) | Corrected |
| GSE12643 | Type II diabetes mellitus | 20 (10/10) | 7513 | 62 (49/13) | Corrected |
| GSE13760 | Type II diabetes mellitus | 21 (10/11) | 10994 | 29 (10/19) | Corrected |
| GSE15653 | Type II diabetes mellitus | 18 (13/5) | 11157 | 500 (303/197) | Corrected |
| GSE20966 | Type II diabetes mellitus | 20 (10/10) | 18819 | 240 (111/129) | Corrected |
| GSE21340 | Type II diabetes mellitus | 20 (5/15) | 3499 | 444 (176/268) | Corrected |
| GSE38642 | Type II diabetes mellitus | 63 (9/54) | 18105 | 500 (189/311) | Corrected |
| GSE40234 | Type II diabetes mellitus | 62 (34/28) | 19098 | 452 (202/250) | Corrected |
| GSE12685 | Alzheimer disease | 13 (6/7) | 11141 | 500 (222/278) | Corrected |
| GSE28146 | Alzheimer disease | 29 (21/8) | 17912 | 500 (171/329) | Corrected |
| GSE36980 | Alzheimer disease | 79 (32/47) | 17837 | 500 (141/359) | Corrected |
| GSE37263 | Alzheimer disease | 16 (8/8) | 15133 | 500 (169/331) | Corrected |
| GSE39420 | Alzheimer disease | 21 (14/7) | 17947 | 500 (162/338) | Not detected |
| GSE4757 | Alzheimer disease | 20 (10/10) | 18801 | 94 (65/29) | Corrected |
| GSE95587 | Alzheimer disease | 117 (84/33) | 18297 | 500 (211/289) | Not detected |
| GSE97760 | Alzheimer disease | 19 (9/10) | 10779 | 500 (397/103) | Not detected |
| GSE14580 | Inflammatory bowel disease | 14 (8/6) | 18612 | 500 (273/227) | Corrected |
| GSE22619 | Inflammatory bowel disease | 20 (10/10) | 18819 | 500 (201/299) | Corrected |
| GSE36807 | Inflammatory bowel disease | 35 (28/7) | 18554 | 500 (193/307) | Not detected |
| GSE14858 | Acute myeloid leukemia | 39 (20/19) | 18337 | 500 (293/207) | Corrected |
| GSE15605 | Melanoma | 74 (58/16) | 18045 | 500 (145/355) | Not detected |
| GSE16499 | Dilated cardiomyopathy | 30 (15/15) | 15095 | 500 (218/282) | Corrected |
| GSE3586 | Dilated cardiomyopathy | 28 (13/15) | 3765 | 500 (231/269) | Not detected |
| GSE42955 | Dilated cardiomyopathy | 29 (24/5) | 17854 | 500 (169/331) | Corrected |
| GSE16515 | Pancreatic cancer | 52 (36/16) | 18361 | 500 (374/126) | Corrected |
| GSE18670 | Pancreatic cancer | 23 (11/12) | 18656 | 320 (187/133) | Corrected |
| GSE23397 | Pancreatic cancer | 21 (15/6) | 15168 | 500 (218/282) | Corrected |
| GSE28735 | Pancreatic cancer | 90 (45/45) | 18141 | 500 (378/122) | Not detected |
| GSE42952 | Pancreatic cancer | 23 (11/12) | 18655 | 500 (131/369) | Not detected |
| GSE18838 | Parkinson disease | 28 (17/11) | 14894 | 500 (134/366) | Corrected |
| GSE19587 | Parkinson disease | 22 (12/10) | 11219 | 500 (164/336) | Corrected |
| GSE20141 | Parkinson disease | 18 (10/8) | 17802 | 500 (452/48) | Corrected |
| GSE20146 | Parkinson disease | 19 (10/9) | 17870 | 132 (25/107) | Corrected |
| GSE20163 | Parkinson disease | 17 (8/9) | 11335 | 500 (177/323) | Corrected |
| GSE20164 | Parkinson disease | 11 (6/5) | 11239 | 195 (64/131) | Corrected |
| GSE20291 | Parkinson disease | 35 (15/20) | 11341 | 500 (227/273) | Corrected |
| GSE20292 | Parkinson disease | 29 (11/18) | 11335 | 500 (232/268) | Corrected |
| GSE20314 | Parkinson disease | 8 (4/4) | 11118 | 58 (38/20) | Corrected |
| GSE20333 | Parkinson disease | 12 (6/6) | 6756 | 298 (226/72) | Corrected |
| GSE7621 | Parkinson disease | 25 (16/9) | 18430 | 500 (204/296) | Corrected |
| GSE90514 | Parkinson disease | 8 (4/4) | 14486 | 128 (84/44) | Not detected |
| GSE18842 | Non-small cell lung cancer | 91 (46/45) | 18683 | 500 (205/295) | Corrected |
| GSE19188 | Non-small cell lung cancer | 156 (91/65) | 18484 | 500 (141/359) | Corrected |
| GSE19804 | Non-small cell lung cancer | 119 (59/60) | 18682 | 500 (129/371) | Corrected |
| GSE20189 | Non-small cell lung cancer | 162 (81/81) | 10949 | 500 (188/312) | Corrected |
| GSE21933 | Non-small cell lung cancer | 42 (21/21) | 18941 | 500 (161/339) | Not detected |
| GSE27262 | Non-small cell lung cancer | 50 (25/25) | 18369 | 500 (100/400) | Corrected |
| GSE52248 | Non-small cell lung cancer | 18 (12/6) | 15210 | 500 (153/347) | Not detected |
| GSE19187 | Asthma | 38 (27/11) | 17803 | 500 (366/134) | Corrected |
| GSE23552 | Asthma | 39 (26/13) | 15210 | 500 (290/210) | Corrected |
| GSE27011 | Asthma | 54 (36/18) | 17634 | 500 (273/227) | Corrected |
| GSE28619 | Alcoholic liver disease | 22 (15/7) | 18464 | 500 (331/169) | Not detected |
| GSE30153 | Systemic lupus erythematosus | 26 (17/9) | 18065 | 20 (5/15) | Corrected |
| GSE50635 | Systemic lupus erythematosus | 48 (32/16) | 17642 | 293 (214/79) | Corrected |
| GSE31189 | Bladder cancer | 92 (52/40) | 18649 | 120 (50/70) | Corrected |
| GSE36389 | Endometrial cancer | 19 (13/6) | 11210 | 500 (151/349) | Corrected |
| GSE38476 | Hepatocellular carcinoma | 20 (10/10) | 13131 | 500 (288/212) | Not detected |
| GSE54236 | Hepatocellular carcinoma | 160 (80/80) | 16823 | 500 (300/200) | Corrected |
| GSE40184 | Hepatitis C | 18 (10/8) | 11099 | 500 (240/260) | Corrected |
| GSE43754 | Chronic myeloid leukemia | 19 (9/10) | 15020 | 500 (308/192) | Not detected |
| GSE45516 | Huntington disease | 9 (6/3) | 18331 | 500 (339/161) | Not detected |
| GSE64810 | Huntington disease | 69 (20/49) | 16046 | 500 (338/162) | Not detected |
| GSE73655 | Huntington disease | 20 (13/7) | 19881 | 65 (39/26) | Not detected |
| GSE48850 | Thyroid cancer | 11 (6/5) | 15665 | 500 (249/251) | Not detected |
| GSE55235 | Rheumatoid arthritis | 20 (10/10) | 11131 | 500 (320/180) | Corrected |
| GSE5808 | Measles | 18 (15/3) | 11156 | 500 (135/365) | Corrected |
We introduce the Disease Pathway Network to counteract the shortcoming of the “single target pathway” approach and improve the sensitivity assessment in an unbiased way. HumanNet-XC, a comprehensive functional network of human genes, was used to analyze inter-pathway connectivity. Inter-pathway connectivity (IPC) was quantified as the sum of direct links and shared neighbors between genes in two pathways, tested for significance using subsampling. Inter-pathway overlap was assessed using the Jaccard index. Pathway pairs with BH FDR-corrected p-values < 0.05 in both tests were retained.
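The two pathway-pair statistics and the multiple-testing correction can be sketched as follows. This is one plausible reading of the IPC definition: for every cross-pathway gene pair, count one for a direct link plus the number of shared neighbors. It is not necessarily the exact formula used with HumanNet-XC. The subsampling test is not shown, and all function names are hypothetical.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected adjacency map from (gene, gene) edge tuples."""
    network = defaultdict(set)
    for a, b in edges:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def inter_pathway_connectivity(network, pathway_a, pathway_b):
    """Sum of direct links and shared neighbors over cross-pathway gene pairs."""
    total = 0
    for a in pathway_a:
        neighbors_a = network.get(a, set())
        for b in pathway_b:
            if a == b:  # assumption: a gene in both pathways is not paired with itself
                continue
            neighbors_b = network.get(b, set())
            if b in neighbors_a:
                total += 1
            total += len((neighbors_a & neighbors_b) - {a, b})
    return total


def jaccard_index(pathway_a, pathway_b):
    """Gene overlap between two pathways: |A ∩ B| / |A ∪ B|."""
    a, b = set(pathway_a), set(pathway_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pathway pair would then be retained only if its adjusted p-value is below 0.05 for both the connectivity test and the overlap test.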
| Pathway name | Pathway ID | Pathway subclass (Human diseases) | Top 10 | Top 20 | Top 40 | All |
|---|---|---|---|---|---|---|
| Acute myeloid leukemia | hsa05221 | Cancer: specific types | 11 | 21 | 43 | 129 |
| Alcoholic liver disease | hsa04936 | Endocrine and metabolic disease | 11 | 21 | 41 | 64 |
| Alzheimer disease | hsa05010 | Neurodegenerative disease | 11 | 21 | 41 | 82 |
| Asthma | hsa05310 | Immune disease | 11 | 21 | 41 | 41 |
| Bladder cancer | hsa05219 | Cancer: specific types | 11 | 21 | 41 | 108 |
| Breast cancer | hsa05224 | Cancer: specific types | 11 | 21 | 42 | 109 |
| Chronic myeloid leukemia | hsa05220 | Cancer: specific types | 11 | 22 | 41 | 129 |
| Colorectal cancer | hsa05210 | Cancer: specific types | 11 | 21 | 41 | 128 |
| Dilated cardiomyopathy | hsa05414 | Cardiovascular disease | 11 | 21 | 41 | 66 |
| Endometrial cancer | hsa05213 | Cancer: specific types | 11 | 21 | 42 | 122 |
| Hepatitis C | hsa05160 | Infectious disease: viral | 12 | 21 | 41 | 128 |
| Hepatocellular carcinoma | hsa05225 | Cancer: specific types | 11 | 21 | 41 | 110 |
| Huntington disease | hsa05016 | Neurodegenerative disease | 11 | 21 | 23 | 23 |
| Inflammatory bowel disease | hsa05321 | Immune disease | 11 | 22 | 41 | 79 |
| Measles | hsa05162 | Infectious disease: viral | 11 | 21 | 41 | 113 |
| Melanoma | hsa05218 | Cancer: specific types | 11 | 21 | 41 | 114 |
| Non-small cell lung cancer | hsa05223 | Cancer: specific types | 12 | 21 | 41 | 129 |
| Pancreatic cancer | hsa05212 | Cancer: specific types | 11 | 21 | 41 | 135 |
| Parkinson disease | hsa05012 | Neurodegenerative disease | 11 | 21 | 28 | 28 |
| Prostate cancer | hsa05215 | Cancer: specific types | 11 | 22 | 42 | 115 |
| Renal cell carcinoma | hsa05211 | Cancer: specific types | 11 | 21 | 41 | 129 |
| Rheumatoid arthritis | hsa05323 | Immune disease | 11 | 21 | 41 | 67 |
| Systemic lupus erythematosus | hsa05322 | Immune disease | 11 | 21 | 39 | 39 |
| Thyroid cancer | hsa05216 | Cancer: specific types | 11 | 21 | 41 | 108 |
| Type I diabetes mellitus | hsa04940 | Endocrine and metabolic disease | 11 | 21 | 41 | 43 |
| Type II diabetes mellitus | hsa04930 | Endocrine and metabolic disease | 11 | 21 | 41 | 125 |
Min, max and median average number of tested pathways in the positive benchmark.
We assessed the EA methods’ performance on randomized data. Ideally, a method applied to randomized data produces pathway p-values that are uniformly distributed between 0 and 1, so that 5% of them fall below the cutoff of 0.05.
Under the null hypothesis, however, EA methods often yield p-values that are biased toward 0, biased toward 1, or bimodal with excess mass at both extremes. Such bias distorts the significance calls of the analysis. Therefore, we studied the p-value distribution of each method to determine whether it was right-skewed (biased toward 0) or left-skewed (biased toward 1). A right-skewed distribution can produce false positives by reporting pathways as affected when they are not. Conversely, a left-skewed distribution can produce false negatives by reporting pathways as non-significant when they are in fact impacted.
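A simple way to summarize a null p-value distribution is sketched below. The choice of summary statistics (tail fractions and the sample skewness) is an assumption for illustration, not the procedure used in this study. `null_pvalue_summary` is a hypothetical name.

```python
from statistics import mean


def null_pvalue_summary(p_values, alpha=0.05):
    """Summarize p-values obtained on randomized data.

    Under a uniform distribution the fraction below alpha is close to alpha
    and the skewness is close to 0. Positive skewness means mass piled up
    near 0 (right-skewed); negative skewness means mass near 1 (left-skewed).
    A bimodal distribution shows up as inflated fractions in both tails.
    """
    n = len(p_values)
    mu = mean(p_values)
    variance = mean((p - mu) ** 2 for p in p_values)
    third_moment = mean((p - mu) ** 3 for p in p_values)
    skewness = third_moment / variance ** 1.5 if variance > 0 else 0.0
    return {
        "fraction_below_alpha": sum(p < alpha for p in p_values) / n,
        "fraction_above_1_minus_alpha": sum(p > 1 - alpha for p in p_values) / n,
        "skewness": skewness,
    }
```

For example, an evenly spaced grid of p-values gives a skewness near 0, whereas cubing those values concentrates them near 0 and gives a positive skewness together with an inflated fraction below 0.05.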
In this study, we used independent positive and negative benchmarks. The positive benchmark yields true positives (TP) and false negatives (FN): target-related pathways correctly identified as significant (p-value < 0.05) and incorrectly reported as non-significant (p-value ≥ 0.05), respectively. Similarly, the negative benchmark yields true negatives (TN) and false positives (FP): pathways correctly reported as non-significant and incorrectly reported as significant, respectively. We created the negative benchmark by resampling gene labels from the genome on each dataset, keeping the number of differentially expressed genes (DEGs) constant so that the false positive rate (FPR) is estimated consistently across tests. To address the imbalance between positive and negative pathways, we calculated TN and FP only on the target-related pathways in the negative benchmark.
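The label-resampling idea behind the negative benchmark can be sketched as follows, assuming that each resample draws a random gene set of the same size as the observed DEG list from the genes measured in the dataset. The function name, the seed handling, and sampling without replacement are illustrative assumptions.

```python
import random


def resample_deg_labels(genome_genes, n_degs, n_resamples, seed=0):
    """Draw random DEG lists of fixed size from the measured genome.

    Each resample keeps the number of DEGs equal to n_degs, so every
    randomized test sees an input of the same size as the real one.
    """
    rng = random.Random(seed)
    genes = sorted(genome_genes)  # fixed order makes the draw reproducible
    return [set(rng.sample(genes, n_degs)) for _ in range(n_resamples)]
```

Any pathway reported as significant for such a random list counts as a false positive, and any pathway reported as non-significant counts as a true negative.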
Using TP, TN, FP, and FN definitions, we derived true positive rate (TPR, or sensitivity) and true negative rate (TNR, or specificity, or 1-FPR). We computed the geometric mean of TPR and TNR (G-mean) as a comprehensive performance summary. Additionally, we assessed the median relative rank of TPs among the top predictions, considering ties by averaging the ranks.
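The metrics above can be written out as a short sketch. The rate and G-mean formulas follow directly from the definitions. The relative rank is taken here as the tie-averaged rank divided by the number of tested pathways, which is an assumption about the normalization. All function names are hypothetical.

```python
import math
from statistics import median


def rates(tp, fn, tn, fp):
    """Return (TPR, TNR, G-mean) from confusion counts."""
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity, equal to 1 - FPR
    return tpr, tnr, math.sqrt(tpr * tnr)


def average_ranks(p_values):
    """1-based ascending ranks; tied p-values share the mean of their ranks."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    ranks = [0.0] * len(p_values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and p_values[order[end + 1]] == p_values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def median_relative_rank(pathway_p_values, target_pathways):
    """Median of rank / number of pathways over the target pathways."""
    names = list(pathway_p_values)
    ranks = average_ranks([pathway_p_values[n] for n in names])
    relative = [r / len(names)
                for n, r in zip(names, ranks) if n in target_pathways]
    return median(relative)
```

Because the G-mean is a product under a square root, it drops to 0 whenever either sensitivity or specificity is 0, so a method cannot score well by reporting everything or nothing as significant.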
To ensure a balanced representation of the benchmarked disease pathway subnetworks, we limited them to the top 20 linked pathways for each target disease pathway. We made this choice due to the relatively low number of total associations observed in Parkinson’s and Huntington’s diseases.
We conducted scalability tests for each method in the benchmark on the 82 datasets, using KEGG pathways as input. Some methods support parallelization and can process multiple datasets simultaneously, which reduces elapsed time in a battery-testing setup. The analysis was conducted on macOS Monterey (v.12.5.1) with an Apple M1 processor (16 GB RAM), except for BinoX, which ran on Ubuntu (v.18.04.6) with an Intel Core i7-2600 3.40 GHz processor (16 GB RAM). For GSEA, the results report elapsed time.